ABSTRACT

With the success of Web applications, most of our data is
now stored on various third-party servers, where it is
processed to deliver personalized services. Naturally we must
be authenticated to access this personal information, but
personalized services that are restricted only by
identification could indirectly and silently leak sensitive
data. We analyzed Google Web Search access mechanisms and found
that the current policy applied to session cookies could be
used to retrieve users' personal data. We describe an
attack scheme leveraging search personalization (based on
the same SID cookie) to retrieve part of the victim's click
history and even some of her contacts. We implemented a
proof of concept of this attack on the Firefox and Chrome Web
browsers and conducted an experiment with ten volunteers.
With this prototype we were able to recover up to 80%
of a user's search click history.

1. INTRODUCTION 

Over the last few years, Google's core service, Search, has
been enhanced through new features, a new UI and display
optimizations. One major improvement in the quality of
search results was the personalization of the ranking
algorithms. Personalized search results are ranked according
to the user's context (i.e. location and language), profile
(search history), social networks and other characteristics
extracted from the use of Google's services. Via the Google
Dashboard, users can view and possibly edit the data that was
collected through their use of Google services. These data may
be very sensitive, so access to this interface naturally
requires authentication.

However, while direct access to users' data is subject to
a strict security policy, the use of personalized services
(which may leak this same personal information) is not.
Indeed, some Web applications like Google Search only verify
the (unsecured) user's session to render personalization
features. Such a session can be hijacked by simply capturing
the corresponding SID cookie. Unlike the cookies used to
authenticate the user, the SID cookie may be sent in cleartext,
i.e. unprotected. Furthermore, this cookie is sent whenever
the user accesses a service hosted on google.com, which
increases attack opportunities. In this paper we show how
the SID cookie could be misused by an attacker to gain
unauthorized access to Google Search personalized results and
history.

We study an information leakage attack that exploits
Google's current access policy regarding personalized
services (i.e. unauthenticated access). More specifically,
we hijack a SID cookie to circumvent Google's protection and
access the personal data of a user who, possibly forced by the
attacker, transmitted her cookie in cleartext. We emphasize
the risk of using unauthenticated personalized services
over a shared network with the following contributions.



1. The description of an information leakage attack that 
uses the unprotected SID cookie to retrieve the victim's 
visited search results and a list of her contacts. 

2. A proof of concept, based on the browser extension
"Firesheep", and the tool we used to evaluate the impact
of the information leakage.

The remainder of this paper is organized as follows. Section
2 describes Google's architecture and the services relevant to
understanding the proposed attack. Section 3 presents
the information leakage attack, which uses the SID cookie to
retrieve the user's click history and contacts. An
implementation of this attack is proposed in Section 4, along
with statistics showing how seriously a Google account's click
history could be compromised. Finally, Section 5 discusses
measures that should be deployed to counter this attack and
Section 6 concludes this paper.

2. GOOGLE SERVICES AND COOKIES 

Google provides more than twenty different services, covering
most people's needs on the Web. With a single
Google account, a user can access all these applications,
even those hosted on different domains (e.g. YouTube or
Blogger). A couple of cookies are used to help users
navigate smoothly between the Google services.



2.1 Google.com cookies 

At least three cookies are systematically sent by the
user's browser to Google servers when accessing a service
under the google.com domain.



• pref: this cookie carries the preferences for the browser
currently accessing the service. These preferences refer
to the interface, the language, the number of results
returned by a search, etc. The pref cookie is attached
to a browser and not bound to a specific user account.

• SID: this non-secured session cookie is transmitted to
Google servers to identify the user and personalize the
provided services. In the particular case of Search, this
identification triggers result personalization. Even
if the user is not logged in, best-effort personalization of
search results can be performed based on recent
browser activity.

• SSID: this secured session cookie provides access to
services that contain user data and personal information,
such as Gmail, Google Calendar, Google Contacts, etc.

It is our understanding that, unlike the SSID cookie, the SID
cookie only has an identification purpose and cannot be used
to authenticate a user. The SSID cookie, on the other hand, is
sent only over encrypted connections and is required for
services providing access to users' data and personal
information. Google thus implements a two-level cookie-based
access policy: the first level only requires user
identification, while access to the second level requires user
authentication.
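The two-level policy can be illustrated with a toy model of how a browser decides which cookies to attach to a request. This is only a sketch and not Google's or any browser's implementation: the `Cookie` class and the `cookies_sent` helper are hypothetical names, domain matching is simplified, and treating `pref` as not flagged secure is an assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Cookie:
    name: str
    domain: str   # matches this domain and all of its subdomains
    secure: bool  # True: only attached to https requests


def cookies_sent(jar, url):
    """Return the names of the cookies a browser would attach to `url`."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    sent = []
    for cookie in jar:
        in_scope = host == cookie.domain or host.endswith("." + cookie.domain)
        if in_scope and (parts.scheme == "https" or not cookie.secure):
            sent.append(cookie.name)
    return sent


JAR = [
    Cookie("pref", "google.com", secure=False),  # assumption: not flagged secure
    Cookie("SID", "google.com", secure=False),   # identification only
    Cookie("SSID", "google.com", secure=True),   # authentication, https only
]

print(cookies_sent(JAR, "http://www.google.com/alerts"))  # ['pref', 'SID']
print(cookies_sent(JAR, "https://mail.google.com/"))      # ['pref', 'SID', 'SSID']
```

In this model, flagging SID as secure would be enough to keep it out of the first (cleartext) request.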

In our study, we focus on the SID cookie and the information
leakage that results from the subsequent service
personalization.

2.2 Setting up the Attack 

The SID cookie is valid over the entire *.google.com domain,
so it is sent to every Web application hosted under
*.google.com. Some services, such as Gmail, are only
available via a secured https connection, whereas others
can be accessed through a cleartext connection (http). This is
the case of Google Search and the other services listed in
Table 1.

The SID cookie is sent with every request to a service
under the google.com domain, even when the
queried page cannot be personalized (e.g. privacy policy,
terms of service). Because this cookie is sent to so many URLs,
a single URL that is not accessible through https is enough
to compromise it.

2.2.1 Bypassing HTTPS enforcement policy
HTTPS-Everywhere [3] is a browser extension that redirects
a user to the secured version of the requested service, when
one is available, to prevent traffic interception.
Unfortunately this approach suffers from several drawbacks:

• First, not every service is available through https yet.
For instance, Google Alerts remains accessible only
through http.

• Second, the list of services available through https has
to be maintained. Some services are already available
in https but not yet redirected. A list of such services
is reported in Table 1.

• Finally, some services are redirected although their https
version is not yet available. As a result these services
cannot be reached by HTTPS-Everywhere users.



Due to these flaws, even HTTPS-Everywhere users could be 
redirected to a URL where they would have to send their 
SID cookies in cleartext. 
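The three drawbacks can be summarized as a small decision table. The sketch below is illustrative only: the `outcome` function and its labels are hypothetical and do not describe how HTTPS-Everywhere is implemented. It classifies a service by whether an https version exists and whether the extension's rule list redirects to it.

```python
def outcome(https_available, redirected):
    """Classify a service for a user of an https-redirecting extension."""
    if redirected and https_available:
        return "protected"    # request is upgraded, the SID cookie is encrypted
    if redirected:
        return "unreachable"  # upgraded to an https URL that does not exist
    return "cleartext"        # an http request goes out with the SID cookie


# (https_available, redirected) for each of the cases discussed above
for case in [(False, False), (True, False), (False, True), (True, True)]:
    print(case, "->", outcome(*case))
```

Only the last combination protects the cookie; the first two correspond to the two columns of Table 1.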

2.2.2 Intercepting the SID cookie

Whenever a user accesses one of the listed URLs, her
vulnerable SID cookie is exposed and so is her personal
information. The objective of the attacker is to force the
victim to exchange this cookie with unsecured services over a
shared network, and so easily capture the cookie. Here are some
examples.

• The attacker can set up an open access point with a
name (SSID) corresponding to a local hot-spot (e.g. a
fast-food restaurant name) and include in the welcome
page a hidden iframe pointing to an unsecured Google
service. When connected, the browser will send the valid
SID cookie to the rogue access point in cleartext.

• In an open wireless network, the attacker could also
spoof the access point's physical address (MAC) and
respond to a victim's HTTP GET request with any Web
page that includes the hidden iframe, just like in the
previous case.

2.2.3 Googling for Cookies 

The simplest way to find SID cookies is to search for them.
Typing the query "pref=id= sid= google" in Google provides
a list of pages where people published captured HTTP traffic,
including SID cookies. Using the "Past Month" search
filter increases the chances of retrieving valid SID cookies.
Not all of these results contain full SID cookies and some of
the listed SID cookies may have already expired, but this
simple search should already provide many valid cookies.

3. PERSONALIZED SEARCH ATTACK 

The search results provided to users who enabled Google
Web Search History are personalized and colored based on
their previous interactions. One advantage of this feature is
the ability to see the websites a user previously visited in
Google's search results. Furthermore, frequently visited
pages are more likely to be ranked highly. In 2009, Google also
started to consider social network indicators as part of the
inputs used to improve the search algorithm. "Social Search"
now up-ranks results that the user's contacts shared publicly
via Google Buzz, Twitter and other social networks.

The personalized search algorithm is based on private
data held in the user's account. As such, it can be considered
a controlled information leakage. In [2], the authors suggest
that this information leakage can be used to learn some of
the results a user clicked on. This flaw has not yet been fixed
because it is considered hard to exploit: an attacker must
know what the victim searched for and then compare the
personalized and un-personalized results. Similarly, the
visited-link coloration feature/vulnerability [2] was not
addressed because it was considered innocuous compared with
the benefits it offers.
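The comparison of personalized and un-personalized results can be pictured as a difference between two ranked lists. The sketch below is a simplified heuristic on made-up data, not the method of [2]: the `promoted` function and the example URLs are hypothetical, and real rankings are noisy, so a promoted result is only a hint of a past click.

```python
def promoted(personalized, baseline):
    """Return URLs ranked higher in `personalized` than in `baseline`."""
    base_rank = {url: i for i, url in enumerate(baseline)}
    hits = []
    for i, url in enumerate(personalized):
        # Missing from the baseline, or moved up: possibly clicked before.
        if url not in base_rank or i < base_rank[url]:
            hits.append(url)
    return hits


baseline = ["a.example", "b.example", "c.example", "d.example"]
personalized = ["c.example", "a.example", "b.example", "d.example"]
print(promoted(personalized, baseline))  # ['c.example']
```

The heuristic only works for a query the victim plausibly issued, which is why this leakage was considered hard to exploit.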

However, since these flaws were reported, Google has
introduced new features that could improve the efficiency
of attacks based on this information leakage. This section
shows how these features could be misused to compromise
users' click history.



URL not available in HTTPS

  Specialties                   Services
  google.com/blogsearch         picasa.google.com
  google.com/dirhp              maps.google.com
  google.com/alerts             knol.google.com/k
  google.com/mobile             webaccelerator.google.com
  google.com/nexus              sketchup.google.com
  google.com/analytics          books.google.com
  google.com/postini            video.google.com
  google.com/chrome             scholar.google.com
  google.com/wallet             gears.google.com
  google.com/ads/preference
  google.com/baraza
  google.com/imghp

URL not redirected to HTTPS

  Specialties                   Services
  google.com/sitesearch         investor.google.com
  google.com/transparencyreport desktop.google.com
  google.com/adsense/support
  google.com/insights/search
  google.com/prdhp
  google.com/appsstatus
  google.com/chromebook
  google.com/patents

Table 1: List of URLs that require the victim to send her cookie in cleartext



3.1 Google Search filters 

In this section we list the features that made the
information leakage attack more critical. These features,
available when clicking on "Show search tools", could be
misused to significantly reduce the number of queries one has
to issue to retrieve the previously clicked search results
(making the attack harder to detect).

3.1.1 Visited Results 

Collecting a user's visited links via random regular searches
may take a very long time, as the result pages mix visited
and unvisited hyperlinks. However, an update of the Google
Search interface introduced new filters. For instance, the
"Visited" filter removes unvisited results¹. With this filter
enabled, the information leakage becomes critical: only visited
links remain listed as search results, partly disclosing the
user's click history.

3.1.2 Social 

With the "Social" filter, pages commented on, tweeted or shared
via a social platform are displayed. The user's social
network is built from her Gmail contacts and possibly
other social applications such as Twitter, Livejournal,
etc. (once linked to her Google account). This filter not
only provides the list of shared links, but also the connections
that exist between the user and her social peers. Moreover,
the user's Gmail contacts, which should be treated as strictly
private data, are exposed and so could be used for blackmail,
phishing or spamming attacks. While we did not investigate
how many contacts this attack may compromise, it
appears that people who share a lot of information or belong
to a large social network are very likely to be listed in the
"Social" search result page.

In addition to the list of Gmail contacts, Google+ users
will also receive links that have been shared by people in
their Circles, whether or not these contacts belong to
exposed Circles. Consequently, information about the victim's
private Circles could also be leaked and retrieved through
the "Social" filter.

3.2 Capturing the victim's data 

The sole prerequisite to run this attack is to capture the
SID cookie of a user with Web Search History enabled. SID
cookies are not marked "secured" and so are usually sent in
cleartext. If the victim uses a secure connection (https),
the attacker could use the iframe injection described in
Section 2 to force her to send the SID cookie in cleartext. Once
the cookie has been intercepted, the attack is launched by
opening a window on Google Search with the "Visited" filter
enabled.

¹ To visit the result page containing only
visited links, one can directly go to
http://www.google.com/search?q=.com&tbo=1&tbs=whv:1

To start the attack, we use a list of 15 terms composed
of domain extensions (.com, .net, .org, .us, .edu, .fr, .co.),
words and acronyms likely to appear in URLs (.jsp, .asp,
php, html, index, www) and popular websites (google, facebook).
To maximize the chance of finding all the visited links, the
program should not only parse the first page of results, but
browse all of them. We implemented a prototype that parses
the result page and then clicks on "Next" to display and
parse the following page, and so on until the last page. With
Google Instant disabled, the attacker can set his Google
pref cookie to display 100 results per page instead of 10.
Because the pref cookie is browser specific, and not linked
to the Google Account currently in use, the attacker's search
preferences won't be mirrored on the victim's browser. The
list of retrievable clicked URLs can therefore be browsed
very quickly. However, this attack is destructive: as the
program interacts with the result pages, queries entered during
the attack will appear in the victim's Web Search History.
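The effect of the results-per-page preference is simple arithmetic, shown below together with the term list as read from the paragraph above. The `pages_needed` helper is a hypothetical name, and the figure of 467 results for a single query is purely illustrative (it is the largest per-user count reported in Section 4, not a measured per-query value).

```python
import math

QUERY_TERMS = [
    ".com", ".net", ".org", ".us", ".edu", ".fr", ".co.",  # domain extensions
    ".jsp", ".asp", "php", "html", "index", "www",         # URL fragments
    "google", "facebook",                                  # popular websites
]


def pages_needed(num_results, per_page):
    """Number of result pages to browse to see `num_results` links."""
    return math.ceil(num_results / per_page)


print(len(QUERY_TERMS))        # 15
print(pages_needed(467, 10))   # 47
print(pages_needed(467, 100))  # 5
```

Under these assumptions, the larger page size cuts the number of page loads per query by roughly a factor of ten.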

We conducted an experiment, analyzed in Section 4,
which shows that on average 40% of the click history
can be retrieved. For Google users who search only
occasionally from their account, up to 80% of the click history
could be retrieved using this method.

4. IMPLEMENTATION AND EVALUATION 

In order to validate the applicability of our attack and to
illustrate how easily it can be deployed, we implemented a
proof of concept as a Firefox extension and another extension
to measure the number of visited links that were retrievable.
Both tools are available online (see
http://unsearcher.org/sid-test) and we describe them in this
section. We then detail and analyze our experiment results.
A Chrome extension has also been developed to quickly set
the SID cookie for the google.com domain (this extension
is also available online at http://unsearcher.org).

4.1 Extending Firesheep 

In October 2010, the Firefox extension Firesheep [1] was
released and emphasized the simplicity of session hijacking.
This extension monitors network interfaces to capture cookies
corresponding to sessions established on popular Web
service websites like Facebook, Google and Twitter. Once
a cookie is captured, the extension provides access to the
account related to the hijacked session.

We extended Firesheep to implement our information
leakage attack. Thanks to Firesheep's modularity, we easily
added a module that performs the attack on the session
hijacked by the original code.

As a result, when a Google SID cookie is captured, the
account name appears in the Firesheep sidebar. Double
clicking on it starts the attack; double clicking again
displays the retrieved list of visited links.

4.2 Measurement methodology 

We asked ten users to run the experiment on their
Google accounts. We provided them with an extension that
extracts from a user's Web Search History the clicks recorded
since 1st January 2011 and then issues some queries from
the user's account with the "Visited" filter activated. The
extension, developed for this experiment, and the
corresponding instructions have been publicly released² to let
users evaluate which portion of their click history can be
exposed. Notice that, in order to preserve the privacy of our
testers, we asked them to send us only the ratio of clicks the
attack was able to retrieve.
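The reported metric can be sketched as the fraction of recorded clicks that also show up among the links returned with the "Visited" filter. This is a minimal illustration with made-up URLs, not the code of the released extension; `retrieved_ratio` is a hypothetical name.

```python
def retrieved_ratio(history, found):
    """Fraction of the clicks in `history` that also appear in `found`."""
    history = set(history)
    if not history:
        return 0.0
    return len(history & set(found)) / len(history)


history = {"a.example/1", "b.example/2", "c.example/3", "d.example/4"}
found = {"a.example/1", "c.example/3", "z.example/9"}  # z: visited earlier
print(retrieved_ratio(history, found))  # 0.5
```

Because only this ratio leaves the tester's machine, the URLs themselves are never disclosed.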

It is worth noting that, among the 10 volunteers, who
all had Web Search History enabled, 6 were not aware that
the service existed and had never knowingly used it. It has
been estimated that 50% of Google accounts have Web Search
History enabled [2], but many users may not be aware that the
service is enabled, as the opt-out option has not always been
obvious [5].

4.3 Result Analysis 

We summarize the results of our experiments in Table 2. We
simulated the attack on a set of ten volunteers who visited
between 88 and 3059 search results between January and
July 2011. For these users, between 72 and 467 links were
retrieved, accounting for between 11% and 82% of the links
recorded in their Web Search History over the considered
period. We also recorded the total number of links that the
attack retrieved independently of the date they were visited.
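Two of the reported figures can be checked directly from the values that appear in Table 2: the 82% upper bound corresponds to 72 of 88 links for the first user, and the "Avg" entry is the mean of the ten per-user link counts. The variable names below are ours.

```python
# "Links (since January 2011)" row of Table 2, users U1 to U10
links = [88, 111, 211, 426, 625, 812, 1148, 1340, 2521, 3059]

average_links = sum(links) / len(links)
print(round(average_links))  # 1034

# U1: 72 links found out of 88 recorded
print(round(100 * 72 / 88))  # 82
```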

For the three users with the most visited links in their
search history (Users 8, 9 and 10), the attack retrieved a
similar number of visited links (between 628 and 644),
although the number of clicks in their Web Search
History varies from 1340 to 3059.

Figure 1 depicts the ratio of retrieved links as a function
of the number of queries submitted. We decided to use
suffixes and file extensions in our query list as they are very
likely to appear in the URLs of visited links and should
return many search results. For instance, the query ".com"
will return the list of visited links with the ".com" suffix.

The objective is to recover a large portion of the victim's
visited search results with the smallest set of search
queries. Limiting the number of queries that will appear in
the Web Search History reduces the risk that the victim
detects the attack. A detected attack is likely to result in
the purge of the victim's Web Search History, which would
prevent any further exploitation of the SID cookie by the
attacker.

² The extension is available at
http://unsearcher.org/Test%20Flaw/ad@monitor.xpi

Four queries with no associated visited search results
are likely to remain unnoticed in victims' Web Search
History and are enough to retrieve a large part of the clicked
results. For all users, the attack submitted the 15 queries in
less than 5 minutes.
Figure 1: Destructive approach: results for the 10
users
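The trade-off between the number of queries and the share of links recovered can be explored offline with a greedy selection: at each step, keep the query that adds the most links not yet seen. This is an illustrative technique that we substitute here, not the procedure used in the experiment, and the coverage data below are invented.

```python
def greedy_queries(coverage, budget):
    """Pick up to `budget` queries, each adding the most new links."""
    chosen, covered = [], set()
    remaining = dict(coverage)
    for _ in range(budget):
        best = max(remaining, key=lambda q: len(remaining[q] - covered),
                   default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing left that would add a new link
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical mapping: query term -> ids of the visited links it returns
coverage = {
    ".com": {1, 2, 3, 4, 5},
    ".org": {5, 6},
    "www": {1, 2, 6},
    "index": {7, 8},
}
chosen, covered = greedy_queries(coverage, budget=2)
print(chosen, len(covered))  # ['.com', 'index'] 7
```

On this toy input, two queries already cover 7 of the 8 links, which mirrors the diminishing returns visible when more queries are added.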



5. COUNTERMEASURE 

The information leakage attack we described only requires
reading victims' SID cookies. If a user is not logged in to her
Google account when she accesses Google services from a
shared network, her SID cookie cannot be compromised. A
solution is therefore to sign out from Google accounts when
connecting from a shared network, or to use a VPN to encrypt
the traffic and prevent cookie interception.

From a user's perspective, a solution to prevent this
attack is to purge the Web Search History and to temporarily
disable this feature. Such a radical solution would definitely
prevent the leakage of information about the user's
search history, but would not prevent a list of Gmail and
Google+ contacts from being exposed by "Social Search".

Another solution is to disable the "visited" and "social"
search filters when a user is not visiting the secured version
of Google. While this (most likely temporary) solution
does not fix the information leakage, it makes its
exploitation more complicated and detectable.
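The filter restriction can be sketched as a server-side check on the request scheme. This is a minimal illustration of the idea, not Google's code: `allowed_filters` is a hypothetical function, and "images" stands in for any filter that does not reveal personal data.

```python
SENSITIVE_FILTERS = {"visited", "social"}


def allowed_filters(scheme, requested):
    """Drop history-revealing filters unless the request came over https."""
    requested = set(requested)
    if scheme.lower() == "https":
        return requested
    return requested - SENSITIVE_FILTERS


print(sorted(allowed_filters("http", ["visited", "images"])))   # ['images']
print(sorted(allowed_filters("https", ["visited", "images"])))  # ['images', 'visited']
```

With such a check, a SID cookie captured in cleartext could no longer be replayed over http to enumerate visited links directly.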

6. CONCLUSION 

We presented an information leakage attack that leverages
Google's two-level cookie-based access policy. We described
the attack, implemented a proof of concept
and evaluated the number of links visited over the last six
months that could be exposed. Both issues should soon be
fixed, and we described measures users could take to protect
their search history from this kind of attack.

Nevertheless, some issues cannot be addressed by users
and require a modification of Google's cookie policy. As
Google is taking steps to include social indicators in result
personalization, users' social networks could soon be exposed.

7. REFERENCES 

[1] Eric Butler. Firesheep, 2010.
http://codebutler.com/firesheep.



USER                         U1   U2   U3   U4   U5   U6   U7    U8    U9    U10   Avg
Links (since January 2011)   88   111  211  426  625  812  1148  1340  2521  3059  1034
Found (since January 2011)   72

Table 2: Summary of the experiment results